Mapping Snakemake

Check Validity of Sample_ids

Check whether any sample ID contains the character '.'.

Check Validity of RNA_types

Check whether any RNA type contains the character '.'.
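The two checks above can be sketched in Python, the language Snakemake files are written in. This is a minimal sketch, assuming that a name containing '.' is treated as invalid; the function names `find_invalid_names` and `check_names` are hypothetical and do not come from the pipeline.

```python
def find_invalid_names(names, forbidden="."):
    """Return the names that contain the forbidden character."""
    return [name for name in names if forbidden in name]


def check_names(names, label):
    """Raise ValueError if any name contains '.' (assumed to be invalid)."""
    invalid = find_invalid_names(names)
    if invalid:
        raise ValueError(f"{label} must not contain '.': {', '.join(invalid)}")
```

The same helper could be called once for the sample IDs and once for the RNA types before any rule runs.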

Parameter Settings

Parameter	Description
`data_dir`	path to `.fastq` files
`rna_types`	RNA types to map reads against; each type has its own index
`adaptor`	adaptor sequence that is trimmed from the reads
`min_read_length`	discard reads shorter than this length
`genome_dir`
`max_read_length`	discard reads longer than this length
`min_base_quality`	base quality threshold used for quality trimming
`temp_dir`	directory for temporary files
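A configuration holding these parameters could be checked before the workflow starts. The sketch below is illustrative only: the parameter names come from the table above, but treating all of them as required, the `validate_config` helper, and every example value are assumptions.

```python
# Parameter names come from the table above; treating all of them as
# required is an assumption made for this sketch.
REQUIRED_KEYS = (
    "data_dir", "rna_types", "adaptor", "min_read_length",
    "genome_dir", "max_read_length", "min_base_quality", "temp_dir",
)


def validate_config(config):
    """Return config unchanged if it passes two basic sanity checks."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError("missing parameters: " + ", ".join(missing))
    if config["min_read_length"] > config["max_read_length"]:
        raise ValueError("min_read_length exceeds max_read_length")
    return config


# Hypothetical example values, for illustration only.
example_config = {
    "data_dir": "data/fastq",
    "rna_types": ["miRNA", "piRNA"],
    "adaptor": "ACGTACGT",
    "min_read_length": 16,
    "genome_dir": "genome/hg38",
    "max_read_length": 100,
    "min_base_quality": 30,
    "temp_dir": "tmp",
}
```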

Mapping Statistics

read_counts_raw

Count reads in .fastq files of raw data

wc -l < {input} | awk '{{print int($0/4)}}' > {output}
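The command divides the line count by four because each FASTQ record spans four lines. A rough Python equivalent, assuming an uncompressed file with strictly four-line records (the function name `count_fastq_reads` is hypothetical):

```python
def count_fastq_reads(path):
    """Count reads in an uncompressed FASTQ file as line count // 4."""
    with open(path, "rb") as handle:
        # Iterating also counts a final line that lacks a trailing
        # newline, which `wc -l` would not.
        n_lines = sum(1 for _ in handle)
    return n_lines // 4
```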

read_counts_mapped

Count reads in .bam files of mapped reads

bamtools count -in {input} > {output}

read_counts_unmapped

Count reads in .fa.gz files of unmapped reads

pigz -p {threads} -d -c {input} | wc -l | awk '{{print int($0/2)}}' > {output}
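Dividing the line count by two assumes that every FASTA record occupies exactly two lines (header and sequence). The sketch below swaps in a slightly different technique, counting header lines instead, which gives the same number for two-line records and also tolerates wrapped sequences. The function name `count_fasta_gz_reads` is hypothetical.

```python
import gzip


def count_fasta_gz_reads(path):
    """Count records in a gzip-compressed FASTA file by counting '>' headers."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))
```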

summarize_read_counts

mapped_read_length

Run a Python script to compute the read length distribution of each .bam file produced by sequential mapping

bin/statistics.py read_length_hist --max-length 600 -i {input} -o {output}
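The internals of `bin/statistics.py` are not shown here, so the following is only a sketch of what a read length histogram bounded by `--max-length` might compute. It takes read lengths as plain integers rather than reading a BAM file, and it assumes that reads longer than the maximum are ignored; both choices are assumptions.

```python
def read_length_hist(lengths, max_length=600):
    """Return a list where index i holds the number of reads of length i.

    Lengths outside 0..max_length are ignored (an assumption of this sketch).
    """
    counts = [0] * (max_length + 1)
    for length in lengths:
        if 0 <= length <= max_length:
            counts[length] += 1
    return counts
```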

merge_mapped_read_length

fastqc

Run FastQC on the raw data

parse_fastqc_data

summarize_fastqc_ipynb

summarize_fastqc_html

cutadapt

cutadapt removes adapter sequences from high-throughput sequencing reads.

cutadapt -a {params.adaptor} -m {params.min_read_length} --trim-n -q {min_base_quality} --too-short-output >(pigz -c -p {threads} > {output.too_short}) -o {output.trimmed} {input}
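To illustrate what the `-a` and `-m` options do, here is a deliberately simplified sketch of 3' adaptor trimming followed by a length filter. It only handles an exact, full-length adaptor match; cutadapt itself also tolerates mismatches and partial adaptors at the read end, and the quality trimming (`-q`) and N trimming (`--trim-n`) steps are left out. The function name `trim_read` is hypothetical.

```python
def trim_read(seq, qual, adaptor, min_read_length):
    """Trim an exact 3' adaptor match and flag reads that end up too short.

    Returns (trimmed_seq, trimmed_qual, too_short).
    """
    pos = seq.find(adaptor)
    if pos != -1:
        # Remove the adaptor and everything that follows it.
        seq, qual = seq[:pos], qual[:pos]
    return seq, qual, len(seq) < min_read_length
```

Reads flagged as too short correspond to those written to the `--too-short-output` file; the rest go to the trimmed output.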

fastq_to_fasta

Convert FASTQ to FASTA, discarding the base quality information
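A minimal sketch of this conversion, assuming strictly four-line FASTQ records: the header line has its leading `@` replaced by `>`, the sequence line is kept, and the separator and quality lines are dropped. The function name `fastq_to_fasta` mirrors the rule name but is otherwise hypothetical.

```python
def fastq_to_fasta(fastq_lines):
    """Yield FASTA lines (without newlines) from four-line FASTQ records."""
    for i, line in enumerate(fastq_lines):
        line = line.rstrip("\n")
        if i % 4 == 0:
            # Header line: replace the leading '@' with '>'.
            yield ">" + line[1:]
        elif i % 4 == 1:
            # Sequence line: keep as is.
            yield line
        # Lines 3 and 4 of each record ('+' and qualities) are dropped.
```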

tbam_to_gbam

Convert a transcript-coordinate BAM alignment file into a genomic-coordinate BAM alignment file

rsem-tbam2gbam {params.index} {input.bam} {output}

sort_gbam

samtools sort {input} > {output.bam}
samtools index {output.bam}

gbam_to_bedgraph

gbedgraph_to_bigwig

sort_tbam

samtools sort -T {params.temp_dir} -o {output} {input}

collect_alignment_summary_metrics

Produces a summary of alignment metrics from a SAM or BAM file.

picard CollectAlignmentSummaryMetrics I={input} O={output}

count_reads_intron

Given a .bed file of intron loci and the `other.bam` file of reads mapped to hg38, count the same-strand overlaps between reads and introns.

bedtools intersect -bed -wa -s -a {input.bam} -b {input.bed} | wc -l > {output}
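The counting logic can be sketched in plain Python. With `-wa -s`, a read is reported once for every feature it overlaps on the same strand, so the line count is a count of (read, feature) pairs. The sketch assumes 0-based half-open intervals given as `(chrom, start, end, strand)` tuples and uses a quadratic scan for clarity; bedtools itself is far more efficient. The function name `count_stranded_overlaps` is hypothetical.

```python
def count_stranded_overlaps(reads, features):
    """Count (read, feature) pairs that overlap on the same chromosome and strand.

    Intervals are (chrom, start, end, strand) with 0-based half-open coordinates.
    """
    features = list(features)  # iterated once per read
    count = 0
    for r_chrom, r_start, r_end, r_strand in reads:
        for f_chrom, f_start, f_end, f_strand in features:
            same_place = r_chrom == f_chrom and r_strand == f_strand
            if same_place and r_start < f_end and f_start < r_end:
                count += 1
    return count
```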

count_reads_promoter

Given a .bed file of promoter loci and the `other.bam` file of reads mapped to hg38, count the overlaps between reads and promoters.

count_reads_enhancer

Given a .bed file of enhancer loci and the `other.bam` file of reads mapped to hg38, count the overlaps between reads and enhancers.

map_circRNA

The software aligns unmapped reads to the circRNA index ...

pigz -d -c other.fa.gz | bowtie2 -f -p {threads} --norc --sensitive --no-unal --un-gz circRNA.aligner.fa.gz -x circRNA - -S - | bin/preprocess.py filter_circrna_reads --filtered-file >(samtools view -b -o {output.bam_filtered}) | samtools view -b -o {output.bam}
